
Phase 1b: MSE-optimal bandwidth selector (CCF 2018 DPI)#335

Merged
igerber merged 17 commits into main from did-no-untreated
Apr 19, 2026

Conversation

@igerber
Owner

@igerber igerber commented Apr 19, 2026

Summary

  • Add diff_diff.mse_optimal_bandwidth + BandwidthResult dataclass — the Phase 1b MSE-optimal bandwidth selector for the HAD (Heterogeneous Adoption Design) nonparametric estimator.
  • Port nprobust 0.5.0 (SHA 36e4e53) lpbwselect(bwselect="mse-dpi") into diff_diff/_nprobust_port.py (~500 lines): kernel_W, qrXXinv, lprobust_res, lprobust_vce, lprobust_bw, lpbwselect_mse_dpi. Three-stage DPI with four nested lprobust.bw calls at orders q+1, q+2, q, p.
  • Parity harness: benchmarks/R/generate_nprobust_golden.R produces benchmarks/data/nprobust_mse_dpi_golden.json with every stage bandwidth + per-stage (V, B1, B2, R). Python port matches R at 0.0000% relative error on all five stage bandwidths across three deterministic DGPs (uniform, Beta(2,2), half-normal).
  • 149 tests pass (67 new for Phase 1b + 82 existing local-linear regression). Public API scope is intentionally restricted to the HAD configuration (vce="nn", nnmatch=3, p=1, deriv=0, interior=False); the broader port surface is available via _nprobust_port for Phase 1c.
  • Phase 1b checkbox ticked in docs/methodology/REGISTRY.md with explicit notes about weights= being unsupported (no nprobust parity anchor) and the public API scope restriction.

Methodology references (required if estimator / math changes)

  • Method name(s): Calonico-Cattaneo-Farrell (2018) MSE-DPI direct-plug-in bandwidth selector for local-polynomial boundary estimation; used by the HeterogeneousAdoptionDiD (HAD) estimator of de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026, arXiv:2405.04465v6) Design 1'.
  • Paper / source link(s):
    • Calonico, S., Cattaneo, M. D., & Farrell, M. H. (2018). On the effect of bias estimation on coverage accuracy in nonparametric inference. JASA 113(522), 767-779.
    • Reference implementation: nprobust 0.5.0 CRAN package (GitHub SHA 36e4e532d2f7d23d4dc6e162575cca79e0927cda, github.com/nppackages/nprobust); the R source at npfunctions.R:187-288 (lprobust.bw) and :498-607 (lpbwselect.mse.dpi) is cited inline in the Python port.
    • de Chaisemartin et al. (2026) — docstring cross-references; registry entry at docs/methodology/REGISTRY.md#heterogeneousadoptiondid.
  • Any intentional deviations from the source (and why):
    • weights= raises NotImplementedError in the public wrapper. nprobust::lpbwselect has no weight argument, so there is no parity anchor. Deferred to Phase 2 (survey-design adaptation). Documented in REGISTRY and docstring.
    • Public API scope restricted to HAD Phase 1b (vce="nn", nnmatch=3, p=1, deriv=0, interior=False are hard-coded in mse_optimal_bandwidth). The underlying port supports a broader surface but those paths are not parity-tested. Callers needing the broader surface go through diff_diff._nprobust_port.lpbwselect_mse_dpi. Documented in REGISTRY and docstring; pinned by a behavioral test.
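The scope restriction above can be sketched as a thin wrapper that hard-codes the HAD configuration and rejects `weights=`. This is an illustrative stub (names and the dummy return value are hypothetical), not the actual `diff_diff` source:

```python
def _lpbwselect_mse_dpi(d, y, eval_point, *, p, deriv, vce, nnmatch, interior):
    # Stand-in for the private port; returns a dummy bandwidth here.
    return 0.5

def mse_optimal_bandwidth(d, y, boundary=0.0, weights=None):
    """Public wrapper: pins the HAD Phase 1b configuration."""
    if weights is not None:
        # No nprobust parity anchor for weights; deferred to Phase 2.
        raise NotImplementedError("weights= is unsupported in Phase 1b")
    return _lpbwselect_mse_dpi(
        d, y, eval_point=boundary,
        p=1, deriv=0, vce="nn", nnmatch=3, interior=False,
    )
```

A behavioral test can then pin the wrapper's output against an explicit call to the private port with the same fixed arguments, so any future scope expansion fails loudly.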

Validation

  • Tests added/updated:
    • tests/test_nprobust_port.py (NEW, 14 tests) — unit tests for every ported helper (kernel, Cholesky-inverse, NN residuals, clustered/unclustered meat, single-stage lprobust_bw parity).
    • tests/test_bandwidth_selector.py (NEW, 78 tests via parametrize) — 5-bandwidth parity on 3 DGPs at 1% + per-stage (V, B1, B2) parity + validation (shapes, finiteness, unknown kernel, weights, bwcheck range) + API-scope pin test + kernel-dispatch + boundary + downstream-integration + rate-scaling (G^{-1/5} MC).
    • tests/test_local_linear.py — unchanged; still green as regression anchor for Phase 1a kernels.
  • Backtest / simulation / notebook evidence (if applicable): N/A — Phase 1b ships kernel machinery only; simulation / replication harness (DGPs 1-3 coverage) is Phase 4.
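For reference, the rate-scaling test above targets the standard local-polynomial MSE-optimal rate: for estimating the ν-th derivative with a polynomial of order p, the bandwidth shrinks as n^{-1/(2p+3)}, which for the HAD configuration (p=1, ν=0) gives the G^{-1/5} law checked by Monte Carlo:

```latex
h_{\mathrm{MSE}}
  = \left( \frac{(2\nu + 1)\, V}{2(p + 1 - \nu)\, B^{2}\, n} \right)^{1/(2p+3)}
  \;\propto\; n^{-1/(2p+3)}
  \quad\xrightarrow{\;p=1,\ \nu=0\;}\quad n^{-1/5}.
```

Here V and B are the usual variance and bias constants of the local-polynomial fit; the DPI selector estimates them in pilot stages rather than assuming them known.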

Security / privacy

  • Confirm no secrets/PII in this PR: Yes.

🤖 Generated with Claude Code

igerber and others added 3 commits April 19, 2026 13:19
Port nprobust 0.5.0 (SHA 36e4e53) lpbwselect(bwselect="mse-dpi") in-house
as diff_diff.mse_optimal_bandwidth + BandwidthResult, backed by private
diff_diff._nprobust_port module (kernel_W, lprobust_bw, lpbwselect_mse_dpi).
Three-stage DPI with four lprobust.bw calls at orders q+1, q+2, q, p.

Parity verified at 0.0000% on all five stage bandwidths (c_bw, bw_mp2,
bw_mp3, b_mse, h_mse) across three deterministic DGPs (uniform, Beta(2,2),
half-normal) via benchmarks/R/generate_nprobust_golden.R.

weights= raises NotImplementedError (no nprobust parity anchor); deferred
to Phase 2. vce='nn' is the only verified variant; hc0/hc1/hc2/hc3 paths
are implemented but untested.

144 tests pass (32 new bandwidth-selector/port + 55 existing local_linear
regression + others). Plan at ~/.claude/plans/vectorized-beaming-feather.md
was approved with 1% parity target; actual port achieves essentially exact
parity.

Phase 1b checkbox ticked in docs/methodology/REGISTRY.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P1 fixes from local AI review:
- Add bwcheck validation in lpbwselect_mse_dpi: reject bwcheck < 1 and
  bwcheck > N with clear ValueError instead of IndexError inside the
  sorted-abs indexing.
- Export BandwidthResult and mse_optimal_bandwidth from
  diff_diff/__init__.py::__all__ so the public API propagates through
  star imports and introspection.
- Fix contradictory docstring in lpbwselect_mse_dpi: the HAD case
  (p=1, deriv=0) has even=False, so the R source dispatches to the
  closed-form C$bw branch, NOT to optimize(). Docstring now matches
  the actual control flow.
- 4 new tests in test_bandwidth_selector covering bwcheck > N,
  bwcheck=0, bwcheck=-1, and bwcheck=None on a 5-obs sample.

148 tests pass (up from 144).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AI review flagged that the public wrapper silently hard-codes vce="nn",
nnmatch=3, p=1, deriv=0, interior=False while the underlying port
supports a broader surface. Two options: expose the extra parameters
or document the restriction. Chose to document since Phase 1b plan
explicitly scopes those paths as untested / deferred.

- mse_optimal_bandwidth docstring now has a "Public API scope" section
  describing the HAD Phase 1b restriction and pointing users who need
  the broader surface at diff_diff._nprobust_port.lpbwselect_mse_dpi.
- REGISTRY.md Phase 1b entry gains a "Note (public API scope
  restriction)" documenting the same contract.
- New test test_public_wrapper_fixes_vce_nn_nnmatch_3 pins the
  restriction: the wrapper's output must equal the explicit port call
  with vce="nn", nnmatch=3, p=1, deriv=0, interior=False. Any future
  scope expansion now fails the test and forces a REGISTRY/docstring
  update in lockstep.

150 tests pass (up from 148).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Overall Assessment
⚠️ Needs changes — I found two unmitigated P1 issues in the new bandwidth-selector path: one methodology applicability check is missing, and one new edge-case path still falls through to opaque linear-algebra failures.

Executive Summary

  • P1: mse_optimal_bandwidth() is documented and wired as a boundary-only Phase 1b API, but it never verifies that the supplied boundary is actually the lower support edge. Interior-point or upper-boundary calls can therefore use the wrong selector branch or return a bandwidth that is incompatible with the one-sided fitter.
  • P1: the DPI port never checks that each stage has enough in-window observations / full-rank design before calling Cholesky inversion. Small samples or mispositioned boundaries can still fail with opaque LinAlgErrors.
  • P3: the tiny-sample test explicitly tolerates arbitrary failures instead of pinning the intended user-facing error contract, and the module docstring still says Phase 1b “will add” the selector.
  • The documented deviations in the registry (weights= unsupported; broader _nprobust_port surface not parity-tested outside HAD Phase 1b) are properly disclosed and are not defects.

Methodology

  • Severity P1. Impact: the exported wrapper is explicitly scoped to the HAD boundary case, but it accepts any numeric boundary and always forwards interior=False. The methodology registry says the nonparametric HAD path is for the support infimum (0 in Design 1', min D_2 in the continuous-near-d_lower variant), and the local fitter itself is one-sided to the right of boundary. As written, an interior point or upper-boundary input silently runs the boundary selector anyway. lpbwselect itself exposes interior as a distinct mode, and the R mse-dpi branch switches behavior based on that flag. Refs: diff_diff/local_linear.py, docs/methodology/REGISTRY.md.
    Concrete fix: in mse_optimal_bandwidth(), enforce the Phase 1b lower-boundary contract explicitly (boundary at d.min() within tolerance, and no observations below it), or reject non-boundary / upper-boundary / two-sided-support inputs with a clear ValueError until a documented interior API exists.
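A minimal sketch of the suggested guard (hypothetical helper name; tolerance handling is illustrative):

```python
def check_lower_boundary(d, boundary, tol=1e-12):
    """Reject interior and upper-boundary evaluation points: the Phase 1b
    wrapper is contractually scoped to the lower support edge."""
    d_min = min(d)
    if boundary > d_min + tol:
        raise ValueError(
            f"mse_optimal_bandwidth is lower-boundary only: boundary={boundary} "
            f"lies above d.min()={d_min}; use the private port for interior points"
        )
```

Called before forwarding to the selector, this turns the silent wrong-branch behavior into an immediate, testable error.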

Code Quality

  • Severity P1. Impact: each DPI stage builds its active-window design matrix and sends it directly into qrXXinv() with no explicit support-size or rank check. That means tiny samples, bwcheck=None, or boundary/support misconfiguration still leak opaque Cholesky failures instead of a clear validation error. The test suite currently acknowledges that gap by only asserting “not IndexError” on a tiny-sample call. Refs: diff_diff/_nprobust_port.py, tests/test_bandwidth_selector.py. bwcheck only enlarges the selected bandwidth when it is provided; setting it to None removes that safeguard entirely.
    Concrete fix: before each qrXXinv() call, validate that the active window has at least the required number of observations and full column rank (n_V >= o+1, n_B1 >= o_B+1, n_B2 >= o_B+2, plus rank checks). Raise a targeted ValueError naming the failing stage and suggesting larger G, a valid lower boundary, or a non-None bwcheck.
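The per-stage thresholds suggested above could look like the following sketch (stage names and message text are illustrative, not the port's actual wording):

```python
def check_stage_counts(n_V, n_B1, n_B2, o, o_B):
    """Minimum in-window observation counts per DPI stage, mirroring the
    thresholds proposed above: n_V >= o+1, n_B1 >= o_B+1, n_B2 >= o_B+2."""
    checks = [
        ("variance", n_V, o + 1),
        ("B1", n_B1, o_B + 1),
        ("B2", n_B2, o_B + 2),
    ]
    for name, n, need in checks:
        if n < need:
            raise ValueError(
                f"{name} stage has {n} in-window observations, needs at least "
                f"{need}; try a larger sample, a valid lower boundary, or a "
                "non-None bwcheck"
            )
```

Running this before each qrXXinv() call replaces the opaque Cholesky failure with an error that names the failing stage.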

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No separate deferred-work item applies here. These two P1 issues are not tracked in TODO.md, so they remain unmitigated.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: diff_diff/local_linear.py still says Phase 1b “will add” the selector, and tests/test_bandwidth_selector.py intentionally accepts any non-IndexError failure on tiny samples. Neither is a blocker by itself, but both weaken the regression contract around the new public API.
    Concrete fix: update the module docstring to reflect that Phase 1b shipped, and once the P1 validation is added, replace the catch-all tiny-sample test with explicit pytest.raises(ValueError, match=...) checks. Also add rejection tests for interior and upper-boundary inputs.

Path to Approval

  1. Add a boundary-applicability check in mse_optimal_bandwidth() that rejects interior-point, upper-boundary, or two-sided-support inputs for the exported Phase 1b API, and add regression tests for boundary=0.5 on U(0,1) data and boundary=d.max().
  2. Add explicit per-stage support/rank guards before qrXXinv() in lprobust_bw() (or a wrapper-level preflight), and update the tiny-sample test to require a clear ValueError on bwcheck=None instead of permitting arbitrary failures.

P1 #1 (methodology): mse_optimal_bandwidth now rejects boundary > d.min()
with a clear ValueError. The Phase 1b wrapper is scoped to the HAD
lower-boundary case (Design 1' with d_0 = 0 or Design 1 continuous-near-
d_lower with d_0 = min D_2). Interior or upper-boundary inputs would
silently run the boundary selector with a symmetric kernel and return
a bandwidth incompatible with the one-sided fitter. The port remains
available for interior / broader surface via
_nprobust_port.lpbwselect_mse_dpi.

P1 #2 (code quality): lprobust_bw validates in-window observation
counts at each of the three local-poly fits before calling qrXXinv:
  - variance: n_V >= o+1
  - B1: n_B1 >= o_B+1
  - B2: n_B2 >= o_B+2
Each guard raises a targeted ValueError naming the failing stage, the
bandwidth, and suggested remediation. Previously these failed with
opaque LinAlgError from Cholesky on under-determined designs.

P3 (doc): local_linear.py module docstring updated to say Phase 1b
"ships" instead of "will add"; tiny-sample test now asserts the new
ValueError contract instead of accepting any non-IndexError failure.

New behavioral tests:
- test_interior_boundary_rejected: boundary=0.5 on U(0,1) rejected
- test_upper_boundary_rejected: boundary=d.max() rejected
- test_boundary_equal_to_min_d_accepted: boundary=min(d) accepted
  (Design 1 continuous-near-d_lower path)
- test_boundary_below_min_d_accepted: boundary=0 with d.min()>0
  accepted (Design 1' path)
- test_bwcheck_none_on_tiny_sample_raises_valueerror: upgraded from
  "catch anything non-IndexError" to pytest.raises(ValueError,
  match="lprobust_bw").

153 tests pass (up from 149).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 5799d427d7eb54eec17104df07d854bd7cb4df65


Overall Assessment

⚠️ Needs changes — the prior interior/upper-boundary P1 is fixed, but I still see two unmitigated P1 issues in the new Phase 1b bandwidth-selector path.

Executive Summary

  • The previous boundary-applicability finding is resolved: the wrapper now rejects interior and upper-boundary inputs, and the tests pin that behavior in diff_diff/local_linear.py:L608-L634 and tests/test_bandwidth_selector.py:L267-L304.
  • The previous tiny-sample failure is only partially resolved: support-count guards were added, but full-rank validation is still missing, so duplicate-support windows can still leak raw Cholesky failures out of qrXXinv().
  • Severity P1: mse_optimal_bandwidth() still accepts lower-boundary mass-point designs even though the Methodology Registry assigns d̲ > 0 mass-point cases to the 2SLS/sample-average path, not the nonparametric CCF bandwidth selector.
  • The documented deviations (weights= unsupported; broader _nprobust_port surface not parity-tested) are properly disclosed in the registry and are not defects.
  • Severity P3: the test suite claims per-stage (V, B1, B2, R) parity, but it only asserts V/B1/B2; the nonzero-R stages are not actually pinned.

Methodology

Affected methods: Calonico-Cattaneo-Farrell (2018) MSE-DPI bandwidth selection for local-polynomial boundary estimation, as used in the HAD nonparametric Equation 7/8 path.

  • Severity P1. Impact: the new public wrapper enforces “lower boundary” but not “continuous near the lower boundary.” The registry explicitly separates the d̲ > 0 mass-point case into a different estimator family: sample averages / 2SLS with instrument 1{D_2 > d̲}, not the nonparametric CCF local-polynomial path. As written, mse_optimal_bandwidth() accepts boundary = min(d) and dispatches straight into lpbwselect_mse_dpi with no check for bunching at d̲, so a lower-boundary mass-point design can silently receive a nonparametric bandwidth for the wrong methodology. See diff_diff/local_linear.py:L517-L537, diff_diff/local_linear.py:L608-L634, docs/methodology/REGISTRY.md:L2172-L2179, docs/methodology/REGISTRY.md:L2228-L2230, and docs/methodology/REGISTRY.md:L2246-L2249.
    Concrete fix: before calling lpbwselect_mse_dpi, detect nontrivial mass at min(d) for the d̲ > 0 case (the registry already plans a >2% modal-min rule) and raise a clear NotImplementedError / ValueError directing users to the Design 1 mass-point 2SLS path until Phase 2 ships.
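The >2% modal-min rule the registry plans could be implemented along these lines (a sketch; the function names and the exact gating on min(d) > 0 are assumptions):

```python
def modal_min_fraction(d):
    """Fraction of observations sitting exactly at min(d)."""
    d_min = min(d)
    return sum(1 for x in d if x == d_min) / len(d)

def check_no_mass_point(d, threshold=0.02):
    """Raise when a d.min() > 0 design has nontrivial mass at its lower
    boundary: that case belongs to the Design 1 mass-point 2SLS path."""
    if min(d) > 0 and modal_min_fraction(d) > threshold:
        raise NotImplementedError(
            "lower-boundary mass point detected; use the Design 1 "
            "mass-point 2SLS path (deferred to Phase 2)"
        )
```

Note that with all-unique doses the modal-min fraction is 1/n, so the 2% threshold only bites for small samples or genuine ties; a production rule would likely pair it with a minimum tie count.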

  • Severity P3. Impact: none. The two intentional deviations called out in the PR body are properly documented in the registry: weights= is unsupported, and the broader private _nprobust_port surface is not parity-tested. I am not treating those as defects. See docs/methodology/REGISTRY.md:L2305-L2305.
    Concrete fix: none.

Code Quality

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The two P1 issues above are not tracked in TODO.md, so they remain unmitigated blockers rather than accepted deferred work.

Security

No findings.

Documentation/Tests

  • Severity P3. Impact: the tests and PR summary overstate the parity coverage. TestStageDiagnosticsParity says it checks per-stage (V, B1, B2, R), but it only asserts V, B1, and B2; the only R assertion is the trivial scale=0 case for stage_d1, so regressions in stage_d2_R, stage_b_R, or stage_h_R could slip through. See tests/test_bandwidth_selector.py:L76-L117 and tests/test_nprobust_port.py:L222-L243.
    Concrete fix: add golden assertions for R on all four stages and add a direct rank-deficient duplicate-support test so the new error contract is actually covered.

Path to Approval

  1. Add a design-applicability guard in mse_optimal_bandwidth() that rejects d̲ > 0 lower-boundary mass-point inputs and points users to the Design 1 mass-point 2SLS path.
  2. Add full-rank validation around every qrXXinv() call in lprobust_bw() (or catch/rethrow LinAlgError) so duplicate-support windows fail with a clear ValueError.
  3. Extend the tests to assert stage R parity from the golden JSON and to cover the rank-deficient duplicate-support case explicitly.

P1 #1 (methodology): mse_optimal_bandwidth now rejects Design 1
mass-point designs. When boundary > 0 and the modal fraction at
d.min() exceeds the REGISTRY-specified 2% threshold, raise
NotImplementedError pointing to the 2SLS sample-average estimator
per de Chaisemartin et al. (2026) Section 3.2.4. Design 1' with
untreated units at d=0 (boundary=0) is still accepted per Garrett
et al. (2020) application precedent.

P1 #2 (code quality): qrXXinv now catches np.linalg.LinAlgError from
Cholesky and re-raises as ValueError with a targeted message naming
the failing dimension and suggesting remediation. Duplicate-support
windows or other rank-deficient designs now fail with a clear error
instead of leaking LinAlgError out of the port.
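The catch/rethrow contract described here can be sketched in pure Python. A toy Cholesky factorization stands in for NumPy's (the real port catches np.linalg.LinAlgError); the point is that a rank-deficient X'X surfaces as a clear ValueError:

```python
import math

def _cholesky(a):
    # Plain-Python Cholesky; fails on rank-deficient / non-PD input.
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = a[i][i] - s
                if pivot <= 0.0:
                    raise ArithmeticError("matrix not positive definite")
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def qrxxinv_guarded(xtx):
    """Sketch of the error contract: rethrow factorization failures as a
    targeted ValueError (returns the Cholesky factor, not the inverse)."""
    try:
        return _cholesky(xtx)
    except ArithmeticError as exc:
        raise ValueError(
            f"qrXXinv: X'X of dimension {len(xtx)} is rank deficient; check "
            "for duplicate support values in the active window"
        ) from exc
```

Callers further up the stack then see one consistent exception type regardless of which stage's window went singular.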

P3 (tests): Added TestStageDiagnosticsParity::test_R_parity covering
all four stages. Previously only V/B1/B2 were pinned; R (BWreg) was
only trivially checked for stage_d1 (scale=0 -> R=0). Now stage_b
and stage_h R values are explicitly parity-tested at 1% against R
nprobust.

New behavioral tests:
- test_mass_point_design_rejected: 10% mass at 0.1 -> NotImplementedError
- test_continuous_near_d_lower_accepted: uniform(0.1, 1.0) passes
- test_untreated_at_zero_accepted: 15% at d=0 with boundary=0 passes
- test_rank_deficient_design_raises_valueerror: rank-1 X -> ValueError
- R parity on all four stages across 3 DGPs (12 new parametrized cases)

169 tests pass (up from 153).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 03532fc0b22ec2e07687bed7393482b689c46c72


Overall Assessment

⚠️ Needs changes — one unmitigated P1 methodology issue remains in the new public bandwidth-selector path.

Executive Summary

  • The prior interior/upper-boundary issue is fixed: the public wrapper now rejects interior and upper-boundary evaluation points.
  • The prior raw Cholesky failure issue is largely fixed: qrXXinv() now re-raises rank-deficient X'X failures as a clear ValueError, and lprobust_bw() has stage-specific minimum-count guards.
  • The prior parity-coverage gap for stage R is fixed: the end-to-end parity suite now asserts R on all four DPI stages.
  • One P1 remains: the new mass-point guard only runs when boundary > 0, so the default boundary=0.0 path can still silently send d_min > 0 mass-point designs down the wrong nonparametric Design 1′ branch.
  • weights= unsupported and the public wrapper’s restricted HAD-only scope are documented deviations in the registry and are not defects.
  • I could not run the test suite in this environment because pytest and numpy are unavailable, so this is a static re-review.

Methodology

  • Severity P1. Impact: the prior mass-point applicability finding is only partially resolved. The new guard in mse_optimal_bandwidth() only checks for bunching when boundary > _boundary_tol, so it catches boundary=d_min>0 inputs but not the default boundary=0.0 call path. As a result, a dataset with d_min > 0 and >2% mass at min(d) can still silently receive a Design 1′ CCF bandwidth instead of the Design 1 mass-point 2SLS redirection required by the registry’s split between Design 1′, Design 1 mass-point, and Design 1 continuous-near-d_lower. See mse_optimal_bandwidth() in diff_diff/local_linear.py:L608, especially the positive-boundary-only mass-point branch at diff_diff/local_linear.py:L646, the default boundary=0.0 API at diff_diff/local_linear.py:L535, the acceptance test for boundary=0 below min(d) at tests/test_bandwidth_selector.py:L315, and the methodology split at docs/methodology/REGISTRY.md:L2172, docs/methodology/REGISTRY.md:L2228, and docs/methodology/REGISTRY.md:L2246.
    Concrete fix: make the default boundary=0.0 path classify or reject d_min>0 inputs instead of silently treating them as Design 1′. The cleanest fix is to apply the registry’s planned rule here: if min(d) < 0.01 * median(d), allow Design 1′; else if modal-min fraction >2%, raise the mass-point 2SLS NotImplementedError; else require boundary=d.min() for the continuous-near-d_lower path.

  • Severity P3. Impact: none. The Phase 1b deviations are properly documented: weights= is unsupported, and the exported wrapper is intentionally limited to p=1, deriv=0, interior=False, vce="nn", nnmatch=3, while broader private-path settings are not parity-tested. See docs/methodology/REGISTRY.md:L2305 and the wrapper docstring at diff_diff/local_linear.py:L517. Concrete fix: none.

Code Quality

No findings. The earlier raw LinAlgError path is now converted into a clear ValueError in qrXXinv(), and the stage-specific minimum-count guards are present in lprobust_bw(). See diff_diff/_nprobust_port.py:L111 and diff_diff/_nprobust_port.py:L368.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The blocking methodology issue above is not tracked in TODO.md:L51, so it remains unmitigated.

Security

No findings.

Documentation/Tests

  • Severity P3. Impact: the selector-level regression for rank-deficient duplicate-support windows is still not actually pinned. The new test only proves that qrXXinv() raises on a hand-made singular matrix, not that a duplicate-support dataset flowing through lprobust_bw() / lpbwselect_mse_dpi() / mse_optimal_bandwidth() surfaces the intended ValueError rather than a different failure. See the catch in diff_diff/_nprobust_port.py:L137, the stage calls in diff_diff/_nprobust_port.py:L714, and the current test at tests/test_bandwidth_selector.py:L362. Concrete fix: add an integration test where the sample has enough rows but too few distinct in-window d values for one pilot stage, and assert that lpbwselect_mse_dpi() and the public wrapper raise ValueError.

Execution note: I could not run pytest here because the review environment lacks both pytest and numpy.

Path to Approval

  1. Update mse_optimal_bandwidth() so the default boundary=0.0 path does not silently accept d_min>0 datasets. Either implement the registry’s Design 1′ / mass-point / continuous-near-d_lower split in this wrapper, or hard-fail ambiguous boundary=0 inputs with d_min>0 and force the caller to choose the correct design explicitly.
  2. Add a regression test where boundary=0.0, d.min()>0, and the modal-min fraction exceeds 2%; it should raise the mass-point NotImplementedError rather than return a bandwidth.
  3. Add a companion regression test where boundary=0.0, d.min()>0, and there is no mass point; it should either apply the documented Design 1′ heuristic or raise a clear error instructing the caller to pass boundary=d.min().
  4. Add a selector-level duplicate-support test that drives a singular in-window design through lpbwselect_mse_dpi() / mse_optimal_bandwidth() and asserts the new ValueError contract.

… test

P1 (methodology): The boundary=0 default path now classifies inputs
per the REGISTRY-planned design="auto" rule. Previously the
mass-point guard only fired for boundary>0, so a dataset with
d.min()>0 and mass at d.min() would silently pass through the
default-boundary path as Design 1'. The new rule:

  - boundary=0 with min(d) < 0.01 * median(d): Design 1' accepted
    (support infimum effectively at 0).
  - boundary=0 with min(d) >= 0.01 * median(d):
      * modal fraction at min(d) > 2%: mass-point design -> raise
        NotImplementedError pointing to the 2SLS / Phase 2 path.
      * otherwise: ambiguous -> raise ValueError asking the caller
        to pass boundary=d.min() for the Design 1 continuous-
        near-d_lower path.
  - boundary>0 path unchanged (mass-point check already in place).

Removed the stale test_boundary_below_min_d_accepted: it used
U(0.01, 1) data which doesn't satisfy Design 1' under the stricter
rule. Replaced with three targeted tests:

  - test_boundary_zero_design_1_prime_accepted: U(0, 1) passes.
  - test_boundary_zero_with_positive_d_min_rejected: U(0.5, 1)
    raises "Ambiguous design".
  - test_boundary_zero_with_d_min_mass_point_rejected: mass at 0.1
    with boundary=0 raises mass-point NotImplementedError.

P3 (tests): Added test_full_stack_rank_deficient_raises_valueerror
driving a 3-distinct-value dataset through both lpbwselect_mse_dpi
and the public wrapper; both must raise ValueError (or
NotImplementedError), never LinAlgError or IndexError.

172 tests pass (up from 169).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: a29a8fa9ffce2f659993a33600c9984c3e484565


Overall Assessment
⚠️ Needs changes — two unmitigated P1 methodology issues remain in the new public bandwidth-selector wrapper.

Executive Summary

  • The previous blocker is fixed: the default boundary=0 path no longer silently treats obvious d_min>0 datasets as Design 1′.
  • The new wrapper now hard-applies the Phase 2 design="auto" heuristic on the Phase 1b public API, which can reject valid Design 1′ finite samples and is not documented in the new Phase 1b registry note.
  • The wrapper still accepts off-support lower boundaries (boundary < d.min()), which are not any documented HAD estimand and can silently target the wrong limit.
  • The nprobust port itself looks faithful, and the documented deviations (weights= unsupported, public API restricted to the HAD Phase 1b surface) are properly recorded.
  • The earlier duplicate-support test gap is only partially closed: the private selector is covered, but the public-wrapper regression still does not pin the intended error contract.
  • Static re-review only: pytest is unavailable in this environment.

Methodology

  • Severity P1. Impact: mse_optimal_bandwidth() now hard-fails the default boundary=0 path whenever d.min() > 0.01 * median(d) and there is no mass point, via the new heuristic at diff_diff/local_linear.py:673 and diff_diff/local_linear.py:693. That rule is the still-unshipped Phase 2 design="auto" heuristic in the registry at docs/methodology/REGISTRY.md:2309, not part of the documented Phase 1b public contract at docs/methodology/REGISTRY.md:2305. Methodologically, the nonparametric Design 1′ path is still the paper’s intended path for support-at-zero designs, including thin-boundary-density simulations like the committed Beta(2,2) DGP class described at benchmarks/R/generate_nprobust_golden.R:195 and discussed at docs/methodology/REGISTRY.md:2231. Hard-rejecting those samples in Phase 1b is an undocumented default-behavior change and can produce production errors on valid inputs. Concrete fix: do not make the 1%/2% heuristic a hard gate on the Phase 1b public API unless you also add an explicit design=/override contract and document it in REGISTRY.md; otherwise, boundary=0 needs a supported explicit Design 1′ path.

  • Severity P1. Impact: boundary validation is still too loose. The wrapper only rejects boundary > d.min() at diff_diff/local_linear.py:625, so any lower off-support boundary still passes and is forwarded directly to lpbwselect_mse_dpi(..., eval_point=boundary) at diff_diff/local_linear.py:708. Downstream, local_linear_fit will also use that same off-support boundary with the one-sided kernel at diff_diff/local_linear.py:347. For HAD, the documented nonparametric estimands are at 0 (Design 1′) or at d̲ = min(D_2) (Design 1 continuous-near-) per docs/methodology/REGISTRY.md:2179 and docs/methodology/REGISTRY.md:2240. Accepting boundary < d.min() is therefore an undocumented off-support extrapolation path that can silently return the wrong bandwidth target. Concrete fix: require boundary to equal 0 or d.min() within tolerance, and reject all other off-support values with a targeted error.

  • Severity P3. Impact: none. The Phase 1b deviations for unsupported weights= and the restricted HAD-only public surface are documented at docs/methodology/REGISTRY.md:2305 and in the wrapper docstring starting at diff_diff/local_linear.py:517. Concrete fix: none.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings beyond the documented Phase 1b deferrals already recorded in the registry.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the duplicate-support regression is still only partially pinned at the public-wrapper level. The private path is now exercised, but test_full_stack_rank_deficient_raises_valueerror() in tests/test_bandwidth_selector.py:397 allows either ValueError or NotImplementedError, so the new design guard can short-circuit before the wrapper actually exercises the qrXXinv/lprobust_bw failure contract. Concrete fix: add a public-wrapper regression using boundary=float(d.min()) on a continuous-near-d_lower duplicate-support sample and assert the specific ValueError contract.

  • Execution note: pytest is not installed in this review environment, so this re-review is static.

Path to Approval

  1. Decide and document the public contract for boundary=0 on d.min()>0 samples. If you want auto-classification, expose an explicit override (for example design=) and record the heuristic in REGISTRY.md; otherwise, remove the hard Phase 2 heuristic from the Phase 1b wrapper.
  2. Tighten boundary validation so the public wrapper accepts only boundary≈0 or boundary≈d.min(), and rejects any other lower off-support value.
  3. Add a regression test that an off-support lower boundary is rejected, e.g. d ~ U(1,2) with boundary=0.5.
  4. Add a regression test that covers the chosen boundary=0 contract on a thin-boundary-density Design 1′ sample, so valid support-at-zero inputs are not accidentally blocked.
  5. Add a wrapper-level duplicate-support regression with boundary=float(d.min()) and assert the specific ValueError surface from the selector path.

Reviewer correctly flagged that the 1%-of-median rule is a Phase 2
design="auto" heuristic, not Phase 1b. Backed off that over-reach.

P1 #1: Removed the min(d)/median(d) < 0.01 check. The mass-point
guard now applies uniformly (whenever d.min() > 0 and modal fraction
at d.min() > 2%) and does not gate on boundary. This still catches
the original concern (silently routing mass-point data through the
nonparametric branch) without rejecting valid Design 1' samples like
Beta(2,2) where d.min() is strictly positive but small.

P1 #2: Tightened boundary validation. The wrapper now accepts only
boundary ~ 0 (Design 1') or boundary ~ d.min() (Design 1
continuous-near-d_lower) within float tolerance. Off-support values --
including the previously-allowed "boundary < d.min()" path -- are
rejected with a targeted error message.
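A minimal sketch of that two-branch validation, with a hypothetical helper name and tolerance choices (the actual wrapper's tolerances may differ):

```python
import numpy as np

def validate_boundary(d: np.ndarray, boundary: float) -> str:
    """Accept only boundary ~ 0 (Design 1') or boundary ~ d.min()
    (Design 1 continuous-near-d_lower); reject everything else."""
    if np.isclose(boundary, 0.0, atol=1e-12):
        return "design_1_prime"
    if np.isclose(boundary, float(d.min())):
        return "design_1"
    raise ValueError(
        f"boundary={boundary} is off-support; pass 0.0 or float(d.min())"
    )

d = np.random.default_rng(1).uniform(1.0, 2.0, size=200)
print(validate_boundary(d, float(d.min())))  # design_1
# validate_boundary(d, 0.5)  # off-support value between 0 and d.min(): raises
```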

P3: Added a public-wrapper duplicate-support regression that drives a
rank-deficient X'X through the full selector stack (boundary =
d.min(), unique minimum, only 4 distinct d values) and asserts a
specific "qrXXinv" ValueError, not LinAlgError.

Test updates:
- Removed test_boundary_zero_with_positive_d_min_rejected: the case
  it modeled is now accepted (no mass point).
- Added test_boundary_zero_thin_boundary_density_accepted: Beta(2,2)
  Design 1' with vanishing boundary density now passes.
- Added test_off_support_boundary_rejected: boundary=0.5 on U(1,2).
- Added test_negative_boundary_rejected: boundary<0 rejected.
- Updated test_nonzero_boundary: uses boundary=float(d.min()), not
  boundary=1.0 (which is off the realized support of U(1,2)).

175 tests pass (up from 172).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 8d357b6804a7b4641a1d080743a01f2f074679cd


Overall Assessment

Looks good — no unmitigated P0/P1 findings in the current diff.

Executive Summary

Methodology

  • No unmitigated P0/P1 findings.
  • Severity P3 (informational). Impact: none. The intentional Phase 1b deviations are documented: weights= is unsupported, and the exported wrapper is intentionally restricted to the HAD Phase 1b surface. Under the review rubric, those are documented deviations, not defects. Concrete fix: none. docs/methodology/REGISTRY.md#L2310 diff_diff/local_linear.py#L517

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new untracked tech debt found in the changed files. The remaining Phase 1b scope limitations are documented in the methodology registry rather than being silently deferred. docs/methodology/REGISTRY.md#L2310

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the public API docs understate the actual error contract of mse_optimal_bandwidth(). The implementation now raises ValueError for unsupported boundary choices and NotImplementedError for detected mass-point designs, but the docstring Raises section still only mentions shape/non-finite/kernel/weights failures. Concrete fix: extend the Raises section in diff_diff/local_linear.py#L564 to cover the boundary rejection at diff_diff/local_linear.py#L622 and the mass-point redirect at diff_diff/local_linear.py#L647; optionally mirror that contract in the Phase 1b registry note. docs/methodology/REGISTRY.md#L2310
  • Static-review note: the new parity and behavioral coverage is substantially better than in the prior rerun, including wrapper-level duplicate-support regression coverage. tests/test_bandwidth_selector.py#L426

CI AI review P3: the docstring only listed shape/finite/kernel/weights
raises but the implementation now also raises on boundary off-support,
mass-point, bwcheck out-of-range, and per-stage rank/count failures.
Document the full contract.

175 tests pass (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 7277ec86e8b1dd4122db382aaa2e848bb99d8538


Overall Assessment

Looks good — no unmitigated P0/P1 findings in the current diff.

Executive Summary

Methodology

  • Severity P3 (informational). Impact: none. The intentional Phase 1b deviations are documented: weights= is unsupported, and the exported wrapper is intentionally restricted to the HAD Phase 1b surface. Under the review rubric, those are documented deviations rather than defects. Concrete fix: none. docs/methodology/REGISTRY.md:2310 diff_diff/local_linear.py:517

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new untracked tech debt found in the changed files.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the boundary parameter docs no longer match the actual wrapper contract. The docstring still says callers should pass d_lower for continuous-near-d_lower designs, but the implementation and tests now require the sample minimum d.min() for that path; users following the current docs can hit an avoidable ValueError. Concrete fix: rewrite the boundary parameter docs (and nearby Notes text) to say the Phase 1b wrapper expects the sample estimate min(d) for the nonzero-boundary case. diff_diff/local_linear.py:535 diff_diff/local_linear.py:628 tests/test_bandwidth_selector.py:481
  • Severity P3. Impact: the public kernel surface is broader than the end-to-end parity anchor. The R golden generator and parity suite only cover kernel="epa"; triangular and uniform are only smoke-tested for dispatch, so two supported kernels lack the same R-backed regression anchor as the default path. Concrete fix: extend the golden generator and parity tests to at least one tri and one uni scenario. benchmarks/R/generate_nprobust_golden.R:191 tests/test_bandwidth_selector.py:455

CI AI review P3: the boundary parameter docs said callers could pass
d_lower (the theoretical support lower bound), but the implementation
requires boundary = float(d.min()) (the sample minimum). Users
following the old docs on e.g. U(1, 2) data with boundary=1.0 would
hit an avoidable ValueError. Docstring now explicitly says use the
sample minimum.

Deferred (P3): extending R golden parity to triangular and uniform
kernels. All three kernels go through the same lprobust_bw code path,
so epa parity transitively covers dispatch; tri/uni parity is a
nice-to-have regression anchor but not a correctness gap.

175 tests pass (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 044baedd98f55aa192a2b0ce829dd4f7f69c4b02


Overall Assessment

⚠️ Needs changes — one unmitigated P1 methodology issue.

Executive Summary

  • The affected method is the Calonico-Cattaneo-Farrell (2018) MSE-DPI bandwidth selector, plus the new HAD-specific public wrapper around it.
  • The core _nprobust_port translation is well source-mapped, and the explicit Phase 1b deviations (weights= unsupported; public wrapper restricted to p=1, deriv=0, interior=False, vce="nn", nnmatch=3) are documented in docs/methodology/REGISTRY.md:L2310, so I did not count those as defects.
  • Prior re-review P3 on the boundary docs looks resolved: the public docstring now says the nonzero-boundary path expects d.min().
  • P1: mse_optimal_bandwidth() never enforces the HAD support restriction D_{g,2} >= 0. Negative-dose samples can still pass through both accepted boundary branches and drive the selector on a non-HAD problem with no warning. diff_diff/local_linear.py:L625-L677, docs/methodology/REGISTRY.md:L2135-L2145
  • P3: the R-backed parity harness is still limited to kernel="epa" and eval_point=0; triangular, uniform, and the boundary=d.min() path are smoke-tested only. benchmarks/R/generate_nprobust_golden.R:L191-L199, tests/test_bandwidth_selector.py:L455-L491
  • Static review only: I could not execute the new suite here because this environment is missing numpy.

Methodology

Affected methods: diff_diff._nprobust_port.lpbwselect_mse_dpi and the HAD public wrapper diff_diff.local_linear.mse_optimal_bandwidth.

  • Severity P1. Impact: the public wrapper is documented as HAD-only, but it never checks the source-material requirement that period-2 doses satisfy D_{g,2} >= 0. If any d < 0 slips in, boundary=0 is accepted via _at_zero, and the symmetric nprobust kernel code can calibrate h_mse on an interior two-sided problem while the downstream fitter is explicitly one-sided on [boundary, boundary + h]; boundary=float(d.min()) also passes even though a negative is outside the documented HAD design. That is an unguarded assumption violation and can yield a plausible bandwidth with no warning. Concrete fix: add a front-door nonnegativity check in mse_optimal_bandwidth() before the _at_zero / _at_d_min routing, and add regression tests for negative-dose input under both boundary branches. diff_diff/local_linear.py:L66-L68, diff_diff/local_linear.py:L280-L297, diff_diff/local_linear.py:L625-L677, diff_diff/_nprobust_port.py:L78-L103, docs/methodology/REGISTRY.md:L2135-L2145, docs/methodology/papers/dechaisemartin-2026-review.md:L18-L29
  • Severity P3 (informational). Impact: none. The two intentional deviations called out by the PR are documented in the registry: weights= is unsupported, and the exported wrapper is intentionally restricted to the HAD Phase 1b surface. Under the review rubric, I did not count those as defects. Concrete fix: none. docs/methodology/REGISTRY.md:L2310, diff_diff/local_linear.py:L517-L527, diff_diff/local_linear.py:L547-L550

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new blocker-level tech debt found in the changed files.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the committed golden/parity harness still anchors only the default kernel="epa" / eval_point=0 path. The additional supported public branches kernel in {"triangular","uniform"} and boundary=float(d.min()) are only smoke-tested, so regressions there would not be caught by the R-backed reference suite. Concrete fix: extend benchmarks/R/generate_nprobust_golden.R with at least one tri, one uni, and one shifted-boundary scenario, regenerate benchmarks/data/nprobust_mse_dpi_golden.json, and assert those values in tests/test_bandwidth_selector.py. benchmarks/R/generate_nprobust_golden.R:L191-L199, tests/test_bandwidth_selector.py:L47-L73, tests/test_bandwidth_selector.py:L455-L491
  • Severity P3. Impact: _nprobust_port.py still says weighted data can be handled by the public wrapper, but the new public wrapper explicitly raises NotImplementedError on weights=. Concrete fix: align the private-module docstring with the public API and the registry note. diff_diff/_nprobust_port.py:L34-L36, diff_diff/local_linear.py:L547-L550, diff_diff/local_linear.py:L599-L604
  • Static review only: I could not run pytest here because the environment lacks numpy.

Path to Approval

  1. In diff_diff/local_linear.py, reject negative-dose input in mse_optimal_bandwidth() before the boundary/design routing so the HAD-only wrapper cannot run on D_{g,2} < 0 data.
  2. Add regression tests that assert negative d raises under both boundary=0.0 and boundary=float(d.min()).

CI AI review P1: mse_optimal_bandwidth did not enforce the HAD
support requirement D_{g,2} >= 0. Negative-dose samples could
silently pass through both boundary branches: boundary=0 (accepted
via _at_zero even with d_min<0) and boundary=float(d.min()) (accepts
any lower edge). The symmetric nprobust kernel would happily
calibrate a two-sided interior bandwidth while the downstream
one-sided fitter runs on [boundary, boundary+h] -- silent
assumption violation.

Front-door check added: np.any(d < 0) raises ValueError with a
message citing the paper's support assumption.

Two new regression tests:
- test_negative_dose_rejected_boundary_zero: d ~ U(-0.5, 0.5) with
  boundary=0 raises.
- test_negative_dose_rejected_boundary_at_d_min: d ~ U(-1, -0.1)
  with boundary=d.min() raises.
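The front-door check amounts to a one-line guard; a hedged sketch (function name hypothetical) covering both regression cases:

```python
import numpy as np

def check_nonnegative_doses(d: np.ndarray) -> None:
    """HAD support assumption: period-2 doses must satisfy D_{g,2} >= 0."""
    if np.any(d < 0):
        raise ValueError("negative doses violate the HAD support assumption D >= 0")

rng = np.random.default_rng(4)
# Both boundary branches are protected by the same front-door check:
for d in (rng.uniform(-0.5, 0.5, 200), rng.uniform(-1.0, -0.1, 200)):
    try:
        check_nonnegative_doses(d)
    except ValueError:
        print("rejected")
```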

Deferred (P3 same as last round): tri/uni/shifted-boundary golden
parity extension. All three kernels share lprobust_bw, so epa parity
transitively covers kernel dispatch.

177 tests pass (up from 175).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 196f91040a07bc19e7b3e8b07bb4bf47d0a919f1


Overall Assessment

⚠️ Needs changes — two unmitigated P1 findings remain.

Executive Summary

Methodology

Code Quality

  • Severity P1. Impact: the new wrapper does not handle empty inputs explicitly. With d=np.array([]), y=np.array([]), shape and finiteness checks pass, then the code falls into d.min() / d.max() and throws a low-level reduction error instead of a targeted validation failure. That is a production edge-case miss under the project’s empty-result-set checklist. Concrete fix: add a front-door if d.size == 0: raise ValueError("d and y must be non-empty") before any reductions, and add a regression test for empty arrays. diff_diff/local_linear.py:614, diff_diff/local_linear.py:654

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the golden/parity harness still only anchors the default kernel="epa" / eval_point=0 path. The additional public branches kernel in {"triangular","uniform"} and boundary=float(d.min()) are only smoke-tested, so a future regression there would not be caught by the R-backed reference suite. Concrete fix: extend benchmarks/R/generate_nprobust_golden.R with at least one tri, one uni, and one shifted-boundary scenario, regenerate benchmarks/data/nprobust_mse_dpi_golden.json, and assert those cases in tests/test_bandwidth_selector.py. benchmarks/R/generate_nprobust_golden.R:191, tests/test_bandwidth_selector.py:476, tests/test_bandwidth_selector.py:498
  • Severity P3. Impact: _nprobust_port.py still says weighted data can be handled by the public wrapper, but the public wrapper explicitly raises NotImplementedError on weights=. That leaves the private-module docs out of sync with the actual API and with the registry note. Concrete fix: change the _nprobust_port module docstring to say the public wrapper does not support weights in Phase 1b. diff_diff/_nprobust_port.py:34, diff_diff/local_linear.py:547, docs/methodology/REGISTRY.md:2310
  • Static review only: this environment lacks numpy and pytest, so I could not run the new selector tests.

Path to Approval

  1. In mse_optimal_bandwidth(), reject boundary≈0 whenever d.min() is not also approximately zero. For d.min()>0, direct callers either to boundary=float(d.min()) (continuous-near-d_lower) or to the future mass-point 2SLS path. Add a regression test with continuous d ~ Uniform(0.01, 1.0) that must fail front-door at boundary=0.
  2. Add an explicit non-empty input guard before d.min() / d.max(), and add a regression test on empty d/y asserting a targeted ValueError.

P1 #1: boundary=0 now enforces a Design 1' support plausibility
heuristic: d.min() <= 5% * median(|d|). Samples with d.min()
substantially positive (e.g. U(0.5, 1)) are rejected with ValueError
directing the caller to boundary=float(d.min()). Threshold chosen
at 5% (not REGISTRY's 1%) so the paper's thin-boundary-density
DGPs (Beta(2,2), d.min/median ~ 3%) still pass. Reordered so the
mass-point check (NotImplementedError, paper Section 3.2.4) fires
before the support-check -- mass-point data should be redirected
to 2SLS regardless of the boundary the caller picked.
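A sketch of the 5%-of-median plausibility heuristic described above (helper name is hypothetical, and deterministic stand-in samples replace the paper's Beta(2,2) DGP):

```python
import numpy as np

def check_design1prime_support(d: np.ndarray, frac: float = 0.05) -> None:
    """Plausibility check for boundary=0 (Design 1'): the smallest
    observed dose should be within `frac` of the median absolute dose."""
    if d.min() > frac * np.median(np.abs(d)):
        raise ValueError(
            "d.min() is far from 0; pass boundary=float(d.min()) for Design 1"
        )

# Thin boundary density (stand-in for Beta(2,2)): min/median ~ 2% < 5%, passes.
check_design1prime_support(np.linspace(0.01, 1.0, 1000))

# Support well above zero (like U(0.5, 1)): min/median ~ 67%, rejected.
try:
    check_design1prime_support(np.linspace(0.5, 1.0, 500))
except ValueError:
    print("rejected: use boundary=float(d.min())")
```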

P1 #2: Empty-input front-door guard. d.size == 0 raises ValueError
with a targeted "must be non-empty" message instead of leaking
the NumPy reduction error from d.min().

P3 (docstring sync): _nprobust_port module docstring no longer says
weighted data can be handled by the public wrapper -- the wrapper
explicitly raises NotImplementedError. Docstring now matches the
actual contract.

P3 (deferred, same as last round): tri/uni/shifted-boundary golden
parity extension.

REGISTRY.md Phase 1b note expanded to document the full input
contract (nonnegativity, boundary applicability, Design 1' support
heuristic, mass-point redirection) so the public API surface is
fully specified in the methodology registry.

178 tests pass (up from 177).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: a313e05b17d95d0d23a37d28577268a5d0b1cda4


Overall Assessment

⚠️ Needs changes — one unmitigated P1 remains.

Executive Summary

Methodology

Code Quality

  • Severity P1 [Newly identified]. Impact: lpbwselect_mse_dpi() accepts x, y, and cluster without validating dimensionality or matching lengths, then default vce="nn" logic sorts x and applies the same integer indexer to y and cluster. If y or cluster is longer than x, the extra entries are silently dropped and the bandwidth is computed on misaligned data with no warning. Because both the wrapper docstring and the registry explicitly direct advanced callers to use this port for the broader surface, this is a real API-contract bug, not just an internal-helper quirk. Concrete fix: at the top of lpbwselect_mse_dpi(), ravel() x/y, reject empty inputs, require x.shape == y.shape, require finite values, and require cluster is None or cluster.shape == x.shape before the argsort block; add regression tests for len(y) > len(x), len(cluster) > len(x), and empty direct-port inputs. diff_diff/_nprobust_port.py:L602-L683, diff_diff/local_linear.py:L517-L527, docs/methodology/REGISTRY.md:L2310, tests/test_bandwidth_selector.py:L208-L284

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

  • Severity P3. Impact: the committed golden generator only emits kernel="epa" / eval_point=0 scenarios, while the non-default public branches (triangular, uniform, boundary=float(d.min())) are only behaviorally smoke-tested in Python. A regression in those branches would not be caught by the R-backed parity suite. Concrete fix: extend benchmarks/R/generate_nprobust_golden.R with at least one triangular case, one uniform case, and one shifted-boundary case; regenerate benchmarks/data/nprobust_mse_dpi_golden.json; and assert those cases in tests/test_bandwidth_selector.py. benchmarks/R/generate_nprobust_golden.R:L180-L199, tests/test_bandwidth_selector.py:L480-L515

Static review only: numpy and pytest are unavailable in this environment, so I could not execute tests/test_nprobust_port.py or tests/test_bandwidth_selector.py.

Path to Approval

  1. Add front-door contract validation to diff_diff._nprobust_port.lpbwselect_mse_dpi() for 1-D, non-empty, finite, same-length x/y, plus same-length cluster when provided, before the argsort / reindex block.
  2. Add regression tests that cover the current failure modes: y longer than x with default vce="nn", cluster longer than x with default vce="nn", and empty direct-port inputs; assert targeted ValueErrors instead of silent truncation or raw NumPy errors.

CI AI review P1: the port's lpbwselect_mse_dpi is advertised as the
advanced-use entry point for callers outside the HAD Phase 1b
wrapper surface. It accepted x, y, cluster without validating
shapes, and the default vce="nn" branch reindexed y/cluster by
argsort(x) -- so a longer y would be silently truncated to match
x's length, producing a bandwidth on misaligned data with no
warning.

Added front-door validation at the top of lpbwselect_mse_dpi:
- x and y ravel()ed and required to have the same shape
- x must be non-empty
- x, y, and eval_point must be finite
- cluster (if supplied) must match x.shape
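A self-contained sketch of that validation order, using a hypothetical helper name (the actual checks sit inline at the top of lpbwselect_mse_dpi):

```python
import numpy as np

def validate_port_inputs(x, y, cluster=None, eval_point=0.0):
    """Front-door contract checks run before any argsort/reindex step."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    if x.size == 0:
        raise ValueError("x must be non-empty")
    if x.shape != y.shape:
        raise ValueError(f"x and y must have the same shape: {x.shape} vs {y.shape}")
    if not (np.isfinite(x).all() and np.isfinite(y).all() and np.isfinite(eval_point)):
        raise ValueError("x, y, and eval_point must be finite")
    if cluster is not None:
        cluster = np.asarray(cluster).ravel()
        if cluster.shape != x.shape:
            raise ValueError("cluster must match x.shape")
    return x, y, cluster

# Silent-truncation regression: a longer y must be rejected, never reindexed.
x = np.linspace(0.0, 1.0, 10)
y = np.linspace(0.0, 1.0, 12)
try:
    validate_port_inputs(x, y)
except ValueError as e:
    print(e)
```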

Seven new tests in TestLpbwselectMseDpiValidation:
- test_mismatched_shapes_raise
- test_longer_y_silent_truncation_rejected (regression for the
  specific nn-reindex bug)
- test_cluster_wrong_length_rejected
- test_empty_direct_port_input_rejected
- test_non_finite_x_rejected
- test_non_finite_y_rejected
- test_non_finite_eval_point_rejected

185 tests pass (up from 178).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

Prior commit moved shape / emptiness / finiteness checks into the
port but left bwcheck validation above them. As a result, empty or
non-finite inputs got "bwcheck exceeds sample size" errors instead
of the targeted contract messages the tests expect. Reorder so
input-shape validation runs first, then bwcheck, then kernel/vce.
Drop duplicate N assignment.

185 tests pass (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 5de3634d36b0ceb2da917a82f320cd3fbd26faa9


Overall Assessment

⚠️ Needs changes — one unmitigated P1 remains.

Executive Summary

  • Affected methods: the PR adds the Calonico-Cattaneo-Farrell MSE-DPI boundary-bandwidth selector, exposed via diff_diff/local_linear.py:L495 and implemented in diff_diff/_nprobust_port.py:L602, for the HAD setting documented in the Methodology Registry and de Chaisemartin et al.
  • The prior re-review blocker is fixed: _nprobust_port.lpbwselect_mse_dpi() now rejects mismatched/empty x/y and wrong-length cluster before the argsort/reindex path, with regression coverage added. diff_diff/_nprobust_port.py:L640, tests/test_nprobust_port.py:L251
  • Severity P1 [Newly identified]: the advanced-use clustered path still has incomplete missing-data handling. cluster is only shape-checked, but clustered meat construction later groups by cluster == c; missing cluster IDs can therefore alter clustered stage-V calculations and the selected bandwidth instead of being dropped like nprobust or rejected up front. diff_diff/_nprobust_port.py:L658, diff_diff/_nprobust_port.py:L301.
  • The intentional deviations called out in the PR are properly documented: weights= remains unsupported, and the exported wrapper is intentionally restricted to the HAD Phase 1b surface. Under the review rubric, those are P3 informational, not blockers. docs/methodology/REGISTRY.md:L2310, diff_diff/local_linear.py:L495.
  • Static review only: pytest and numpy are unavailable in this environment, so I could not execute the new test suite.

Methodology

  • Severity P1 [Newly identified]. Impact: the new front-door validation in diff_diff/_nprobust_port.py:L658 enforces shape/emptiness/finiteness for x and y, but not missingness for cluster. Once a clustered call reaches diff_diff/_nprobust_port.py:L301, rows are assigned to blocks via cluster == c; missing cluster IDs can therefore either drop out of every block or form a spurious cluster, silently changing the clustered meat and hence the DPI bandwidth. The reference nprobust::lpbwselect() does complete-case filtering on x, y, and cluster before dispatch, so this is an undocumented source-behavior mismatch on the documented advanced-use path. Concrete fix: in lpbwselect_mse_dpi(), either mirror nprobust's complete-case filtering for x/y/cluster before sorting, or reject missing cluster IDs up front with a targeted ValueError; then add a regression test for clustered calls with missing cluster labels. diff_diff/_nprobust_port.py:L602, diff_diff/_nprobust_port.py:L658, diff_diff/_nprobust_port.py:L301.
  • Severity P3. Impact: none. The wrapper's HAD-only scope restrictions and weights= rejection are explicitly documented in the Registry and docstrings, so they are not methodology defects under the review rubric. Concrete fix: none. docs/methodology/REGISTRY.md:L2310, diff_diff/local_linear.py:L495.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

  • Severity P3. Impact: the new validation tests cover wrong-length cluster, but they do not pin the missing-cluster case that still breaks source parity on the clustered advanced-use path, so the blocker above would regress unnoticed. Concrete fix: add direct-port tests with clustered calls where cluster contains np.nan and None, and assert either R-style complete-case dropping or the documented targeted error. tests/test_nprobust_port.py:L251
  • Static review only: I could not run tests/test_nprobust_port.py or tests/test_bandwidth_selector.py because pytest/numpy are not installed in this environment.

Path to Approval

  1. In diff_diff/_nprobust_port.py:L602, handle missing cluster IDs before any sort/reindex or clustered lprobust_vce() call. Preferred: mirror nprobust complete-case filtering. Acceptable fallback: reject missing cluster IDs explicitly.
  2. Add regression coverage for clustered advanced-use calls with missing cluster labels in tests/test_nprobust_port.py:L251.
  3. If the project intentionally chooses stricter erroring instead of nprobust-style row dropping, document that deviation in docs/methodology/REGISTRY.md and the _nprobust_port.py docstring so it is no longer an undocumented behavior mismatch.

CI AI review P1: the advanced-use lpbwselect_mse_dpi accepted
cluster arrays with NaN / None silently. Downstream, clustered
meat construction groups rows via cluster == c; NaN compares
unequal to itself, so missing-cluster rows would drop from every
block and the meat matrix would change silently, altering the
selected bandwidth.

nprobust::lpbwselect does complete-case filtering on (x, y, cluster)
before dispatch. This port deliberately chooses the stricter
"reject" policy consistent with the library's no-silent-failures
convention: missing cluster IDs raise ValueError up front. The
"reject vs. filter" deviation from nprobust is documented in the
module docstring Deviations list.

Handles both NaN in float arrays and None in object-dtype arrays.
Two new regression tests:
- test_missing_cluster_id_rejected_nan
- test_missing_cluster_id_rejected_none
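The reject policy can be sketched as follows (helper name hypothetical; note that NaN != NaN is exactly why a NaN-labeled row would silently match no `cluster == c` block downstream):

```python
import numpy as np

def check_cluster_ids(cluster) -> None:
    """Reject missing cluster IDs up front (NaN in float arrays,
    None in object-dtype arrays) instead of silently dropping rows."""
    c = np.asarray(cluster)
    if c.dtype == object:
        missing = any(v is None or (isinstance(v, float) and np.isnan(v)) for v in c)
    elif np.issubdtype(c.dtype, np.floating):
        # NaN compares unequal to itself, so NaN rows vanish from every block.
        missing = bool(np.isnan(c).any())
    else:
        missing = False
    if missing:
        raise ValueError("cluster contains missing IDs; drop or relabel them first")

check_cluster_ids(np.array([1, 1, 2, 2]))  # integer labels: fine
try:
    check_cluster_ids(np.array([1.0, np.nan, 2.0]))
except ValueError:
    print("NaN cluster rejected")
```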

187 tests pass (up from 185).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: aca17caccad0f3260d3720798ac9028497aad592


Overall Assessment

⚠️ Needs changes — one unmitigated P1 remains.

Executive Summary

  • Affected method: the PR adds the Calonico-Cattaneo-Farrell MSE-DPI bandwidth selector via diff_diff.mse_optimal_bandwidth and the new private port in diff_diff/local_linear.py:L495 and diff_diff/_nprobust_port.py:L610.
  • Re-review status: the prior missing-cluster-ID finding appears fixed. The port now rejects wrong-length and missing cluster inputs up front in diff_diff/_nprobust_port.py:L666, with regression coverage in tests/test_nprobust_port.py:L320.
  • Severity P1 [Newly identified]: the clustered lprobust_vce() path is not a faithful port of the cited nprobust source. The Python code returns w * M in diff_diff/_nprobust_port.py:L351, while the cited R source computes the same finite-sample factor w but returns M; lprobust.bw uses that result directly in the stage-V calculations. This changes clustered pilot/final bandwidths for direct _nprobust_port callers and future clustered integrations. (rdrr.io)
  • The documented Phase 1b deviations are properly recorded and are not defects under the rubric: weights= rejection, HAD-only public wrapper scope, and the wrapper’s input-contract heuristics are all explicitly noted in docs/methodology/REGISTRY.md:L2310.
  • Static review only: I could not run the new tests in this environment because numpy/pytest are not installed.

Methodology

  • Severity P1 [Newly identified]. Impact: the clustered advanced-use path in lprobust_vce() does not match the cited nprobust implementation. In the PR, the clustered branch computes w and returns w * M at diff_diff/_nprobust_port.py:L351. In the cited nprobust source, lprobust.vce computes the same w, accumulates M, and returns M; lprobust.bw then plugs that return value directly into V.V and V.B. Because diff_diff/_nprobust_port.py:L471 and diff_diff/_nprobust_port.py:L537 feed lprobust_vce() straight into the DPI stages, clustered calls to _nprobust_port.lpbwselect_mse_dpi(..., cluster=...) will silently produce different stage variances and bandwidths from the claimed reference implementation. The public Phase 1b wrapper is unaffected today because it hard-codes cluster=None at diff_diff/local_linear.py:L742. Concrete fix: either make the clustered branch match the cited source exactly, or explicitly document the clustered finite-sample correction as a **Note (deviation from R):** in the registry and module docstring and narrow the “faithful/exact port” wording accordingly. (rdrr.io)
  • Severity P3. Impact: none. The wrapper’s weights= rejection, HAD-only exported scope, and boundary/input-contract restrictions are explicitly documented in the Phase 1b registry entry, so they are informational rather than defects under this rubric. Concrete fix: none. See docs/methodology/REGISTRY.md:L2310 and diff_diff/local_linear.py:L517.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. I did not find a TODO.md entry that already tracks the clustered lprobust_vce mismatch.

Security

  • No findings.

Documentation/Tests

Path to Approval

  1. Resolve the clustered lprobust_vce() mismatch in diff_diff/_nprobust_port.py:L351: either match the cited nprobust source exactly, or explicitly record the clustered finite-sample scaling as a **Note (deviation from R):** in docs/methodology/REGISTRY.md:L2310 and the module docstring.
  2. Add a clustered regression/parity test that exercises cluster != None through the DPI path so this variance/bandwidth behavior is pinned, rather than relying only on vce="nn" golden tests and cluster-missing validation.
  3. Update the stale 1% comment in tests/test_bandwidth_selector.py:L337 to the implemented 5% rule.

CI AI review P1: my port of lprobust.vce multiplied the accumulated
cluster meat by w = ((n-1)/(n-k)) * (g/(g-1)). But the R source
(npfunctions.R:165-185) computes w but does NOT apply it to the
returned M -- w is dead code in the R function. My w factor would
silently inflate clustered stage variances and drift the DPI
bandwidths away from nprobust.

The public wrapper is unaffected (it hard-codes cluster=None), but
the documented advanced-use entry point lpbwselect_mse_dpi supports
cluster= and would have produced non-faithful results.

Fix: drop the w multiplier so lprobust_vce returns the raw
sum-of-outer-products of cluster scores, matching R. A comment
documents that w is computed-but-unused in R so future maintainers
don't reintroduce the multiplier thinking it was a port omission.
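A minimal sketch of the unscaled cluster-summed meat described above (function name hypothetical; the actual port's interface differs):

```python
import numpy as np

def clustered_meat(scores: np.ndarray, cluster: np.ndarray) -> np.ndarray:
    """Raw cluster-summed meat: sum_g s_g s_g', with s_g the within-cluster
    sum of score rows. No ((n-1)/(n-k)) * (g/(g-1)) factor -- in the R
    source that w is computed but never applied to the returned M."""
    k = scores.shape[1]
    M = np.zeros((k, k))
    for c in np.unique(cluster):
        s_g = scores[cluster == c].sum(axis=0)
        M += np.outer(s_g, s_g)
    return M

rng = np.random.default_rng(3)
S = rng.normal(size=(12, 2))          # 12 score rows, 2 columns
g = np.repeat([0, 1, 2, 3], 3)        # 4 clusters of 3 rows each
M = clustered_meat(S, g)
assert np.allclose(M, M.T)            # meat is symmetric
```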

Three new tests in TestLprobustVceClustered:
- test_clustered_meat_matches_unscaled_sum: bit-exact vs manual
  sum without the w factor
- test_clustered_is_symmetric: meat is symmetric
- test_clustered_end_to_end_through_lprobust_bw: smoke-tests the
  clustered DPI path end-to-end

P3 (stale comment): test_boundary_zero_with_data_far_from_zero_rejected
docstring and comment referenced "1% of median" but the implemented
rule is 5%. Aligned with the code.

P3 (same as last 4 rounds): tri/uni kernel golden parity extension.

190 tests pass (up from 187).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber (Owner, Author) commented Apr 19, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: be2a6438b98b8b34d841c2b5a0994734a4c37081


Overall Assessment

Looks good — no unmitigated P0/P1 findings.

Executive Summary

Methodology

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. I did not identify any remaining issue in this PR that needed downgrade via the review-debt tracker in TODO.md:51.

Security

  • No findings.

Documentation/Tests

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 19, 2026
Previous phrasing "parity verified at 0.0000%" reads as "0% agreement"
at a glance. What we actually have is 0.0000% relative error -- i.e.,
bit-parity within float64 precision. Clearer wording.

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber (Owner, Author) commented Apr 19, 2026

/ai-review

@igerber igerber removed the ready-for-ci Triggers CI test workflows label Apr 19, 2026
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 795b461f9b19fb3d097180e7aba395ef50821cd3


Overall Assessment

Looks good — no unmitigated P0/P1 findings.

Executive Summary

  • Re-review focus: the prior clustered lprobust_vce() issue appears resolved. The clustered branch in diff_diff/_nprobust_port.py#L338 now returns the unscaled cluster-summed meat, matching the nprobust source where the finite-sample factor w is computed but not applied to M. (rdrr.io)
  • Affected methods: this PR adds the Calonico-Cattaneo-Farrell (2018) MSE-DPI bandwidth selector in the private port diff_diff/_nprobust_port.py#L610 and the HAD-facing public wrapper diff_diff/local_linear.py#L495. The staged q+1, q+2, q, p call structure and the closed-form odd-(p-deriv) branch match the reference nprobust implementation and the paper’s Design 1′ / Design 1 split. (arxiv.org)
  • The wrapper’s extra guardrails (weights rejection, boundary restriction, mass-point redirection, Design 1′ plausibility heuristic) are explicitly documented in docs/methodology/REGISTRY.md#L2310, so under the review rubric they are informational deviations, not blockers.
  • Edge-case coverage is materially stronger. The PR adds targeted validation for empty/non-finite inputs, bwcheck range, boundary applicability, rank-deficient pilot windows, and clustered-meat regression anchors in tests/test_bandwidth_selector.py#L167 and tests/test_nprobust_port.py#L247.
  • Static review only: I could not execute the suite here because numpy is not installed in this environment.
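For orientation, the staged call structure mentioned above (four bandwidth-selection calls at orders q+1, q+2, q, p) can be caricatured as follows; the update rule and the way pilot bandwidths feed forward are placeholders only, not the port's actual computation:

```python
def lprobust_bw(order, pilot_bw):
    # Placeholder update; the real routine estimates variance (V) and
    # bias (B1, B2) constants at this polynomial order and returns an
    # MSE-optimal bandwidth from them.
    return pilot_bw * (1.0 + 0.1 * order)

def mse_dpi_sketch(p=1, q=2, rot_bw=0.5):
    # stage 1: pilot bandwidths at orders q+1 and q+2 (feed-forward
    # shown here is illustrative only)
    bw_q1 = lprobust_bw(q + 1, rot_bw)
    bw_q2 = lprobust_bw(q + 2, rot_bw)
    # stage 2: bias-estimation bandwidth b at order q
    b = lprobust_bw(q, max(bw_q1, bw_q2))
    # stage 3: main bandwidth h at order p
    h = lprobust_bw(p, b)
    return h, b
```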

Methodology

  • Severity P3. Impact: none. The public wrapper in diff_diff/local_linear.py#L495 deliberately narrows raw nprobust behavior to the documented HAD Phase 1b surface: weights unsupported, boundary limited to 0 or d.min(), mass-point cases redirected away from the nonparametric path, and an extra Design 1′ support heuristic. Those choices are explicitly recorded in docs/methodology/REGISTRY.md#L2310, so they are informational rather than defects under the rubric. The underlying method split itself is consistent with the paper’s Design 1′ / mass-point / continuous-near-zero cases and with nprobust::lpbwselect.mse.dpi. Concrete fix: none. (arxiv.org)
  • No other methodology findings.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. I did not identify any new deferrable item from this PR that needed to be added to TODO.md#L51.

Security

  • No findings.

Documentation/Tests

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 19, 2026
CI's "Copy tests to isolated location" step runs tests against the
installed package from /private/tmp/tests/, but does NOT copy
benchmarks/ alongside. My parity tests hard-failed with
FileNotFoundError when the golden JSON was not present at the
expected relative path.

Follow the established repo convention for golden-value fixtures:

    if not GOLDEN_PATH.exists():
        pytest.skip("Golden values file not found; run: Rscript ...")

Matches the pattern used by test_csdid_ported.py,
test_chaisemartin_dhaultfoeuille_parity.py, test_survey_real_data.py,
test_survey_estimator_validation.py, and test_linalg_hc2_bm.py.

Parity tests still run as hard gates:
- when invoked from the repo root (local /pre-merge-check,
  /ai-review-local, dev iteration)
- when benchmarks/ is present alongside tests/ in CI jobs

They skip in the isolated-install job where only tests/ is copied.
Net effect matches the shipping convention for all other
R-backed parity suites in this repo.
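In full, the guard looks roughly like this (the path resolution is illustrative; the real tests locate benchmarks/ according to the repo layout):

```python
import json
from pathlib import Path

import pytest

# Illustrative path; the actual tests resolve benchmarks/ relative to
# the repository root, not the current working directory.
GOLDEN_PATH = Path("benchmarks/data/nprobust_mse_dpi_golden.json")

def load_golden(path=GOLDEN_PATH):
    # Soft-skip when the fixture is absent (e.g. CI's isolated-install
    # job copies only tests/); runs as a hard gate whenever the golden
    # JSON is present.
    if not path.exists():
        pytest.skip("Golden values file not found; run: Rscript ...")
    return json.loads(path.read_text())
```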

190 tests pass (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber merged commit ed94e43 into main Apr 19, 2026
19 checks passed
@igerber igerber deleted the did-no-untreated branch April 19, 2026 21:45
igerber added a commit that referenced this pull request Apr 20, 2026
…-failures audit

Packages 161 commits across 18 PRs since v3.1.3 as minor release 3.2.0. Per
project SemVer convention, minor bumps are reserved for new estimators or new
module-level public API — BusinessReport / DiagnosticReport / DiagnosticReportResults
(PR #318) add a new public API surface and drive this bump.

Headline work:
- PR #318 BusinessReport + DiagnosticReport (experimental preview) - practitioner-ready
  output layer. Plain-English narrative summaries across all 16 result types,
  with AI-legible to_dict() schemas. See docs/methodology/REPORTING.md.
- PR #327, #335 did-no-untreated foundation - kernel infrastructure, local linear
  regression, HC2/Bell-McCaffrey variance, nprobust port. Foundation for the
  upcoming HeterogeneousAdoptionDiD estimator.
- PR #323, #329, #332 dCDH survey completion - cell-period IF allocator (Class A
  contract), heterogeneity + within-group-varying PSU under Binder TSL, and
  PSU-level Hall-Mammen wild bootstrap at cell granularity.
- PR #333 performance review - docs/performance-scenarios.md documents 5-7
  realistic practitioner workflows; benchmark harness extended.

Silent-failures audit closeouts (PRs #324, #326, #328, #331, #334, #337, #339)
continue the reliability work started in v3.1.2-3.1.3 across axes A/C/E/G/J.

CI infrastructure: PRs #330 and #336 exclude wall-clock timing tests from default
CI after false-positive flakes; perf-review harness is the principled replacement.

Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml,
diff_diff/guides/llms-full.txt, and CITATION.cff (version: 3.2.0, date-released:
2026-04-19). CHANGELOG populated with Added / Changed / Fixed sections and the
comparison-link footer. CITATION.cff retains v3.1.3 versioned DOI in identifiers;
the v3.2.0 versioned DOI will be minted by Zenodo on GitHub Release and added in
a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>